Add sort integration benchmark #13306

2010YOUY01 · 2024-11-08T10:12:06Z

Which issue does this PR close?

Closes #.

Rationale for this change

I noticed there is no benchmark to test sorting the whole relational table: existing sort benchmark is only for a single SortExec, and this can't test how would end-to-end large sort query scale to multiple CPU cores. With integration test, it is possible to see the combined performance of local sort on small batches and final step multi way sort-preserving merge.

The benchmark includes 10 queries to sort the entire lineitem table in TPCH dataset, with different characteristics. For example: with different number of sort key/ payload columns, different sort key types and cardinality, etc. Also it is easy to add more benchmark queries.
More details see sort_integration.rs

What changes are included in this PR?

Added a single benchmark binary for benchmark. It can be executed with:

# Under benchmarks/
./bench.sh run sort_integration

Q1 iteration 0 took 211.0 ms and returned 6001215 rows
Q1 iteration 1 took 186.7 ms and returned 6001215 rows
Q1 iteration 2 took 184.2 ms and returned 6001215 rows
Q1 iteration 3 took 185.4 ms and returned 6001215 rows
Q1 iteration 4 took 189.4 ms and returned 6001215 rows
Q1 avg time: 191.36 ms
Q2 iteration 0 took 156.9 ms and returned 6001215 rows
Q2 iteration 1 took 163.4 ms and returned 6001215 rows
Q2 iteration 2 took 166.2 ms and returned 6001215 rows
Q2 iteration 3 took 162.5 ms and returned 6001215 rows
Q2 iteration 4 took 169.5 ms and returned 6001215 rows
Q2 avg time: 163.70 ms
Q3 iteration 0 took 806.1 ms and returned 6001215 rows
Q3 iteration 1 took 812.8 ms and returned 6001215 rows
......

Query run results comparing sorting lineitem table with scaling factor 1 and 5

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ sort-lineitem-sf1 ┃ sort-lineitem-sf5 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │          186.49ms │         1244.18ms │  6.67x slower │
│ Q2           │          156.91ms │         1000.38ms │  6.38x slower │
│ Q3           │          804.57ms │         4682.39ms │  5.82x slower │
│ Q4           │          241.32ms │         1464.02ms │  6.07x slower │
│ Q5           │          407.56ms │         2037.05ms │  5.00x slower │
│ Q6           │          441.52ms │         2193.89ms │  4.97x slower │
│ Q7           │          786.11ms │         7000.62ms │  8.91x slower │
│ Q8           │          535.87ms │         2835.62ms │  5.29x slower │
│ Q9           │          532.31ms │         2957.57ms │  5.56x slower │
│ Q10          │          841.96ms │         9289.74ms │ 11.03x slower │
└──────────────┴───────────────────┴───────────────────┴───────────────┘

Are these changes tested?

Are there any user-facing changes?

alamb

Thank you @2010YOUY01 -- I am sorry for the delay in reviewing

I found this PR really well commented and documented. ❤️

What would you think about adding this benchmark to the dfbench benchmark binary rather than creating an entirely new one?

Perhaps it can be SortIntegration or SortTPCH for a name?

https://github.com/apache/datafusion/blob/77f330c6a2b26f2d1d4d4bf11d456fad466316b4/benchmarks/src/bin/dfbench.rs#L45-L44

Each different binary has substantial overhead when building (and it is a different target to build, etc) so reducing the total number of binaries I think helps the overall CI time

alamb · 2024-11-13T13:54:02Z

benchmarks/src/bin/sort_integration.rs

+    /// - Thin variant: `l_partkey` column with `BIGINT` type (1 column)
+    /// - Wide variant: all columns except for possible key columns (12 columns)
+    const SORT_QUERIES: [&'static str; 10] = [
+        // Q1: 1 sort key (type: INTEGER, cardinality: 7) + 1 payload column


thank you for your annotations here. 👍

2010YOUY01 · 2024-11-15T04:48:41Z

@alamb Thank you, this makes sense to me. Updated.

alamb · 2024-11-15T20:06:47Z

Thank you very much @2010YOUY01 -- very cool

2010YOUY01 added 2 commits November 8, 2024 17:50

Add sort integration benchmark

ef7d942

clippy

199bdae

alamb approved these changes Nov 13, 2024

View reviewed changes

2010YOUY01 marked this pull request as draft November 14, 2024 13:22

2010YOUY01 marked this pull request as ready for review November 15, 2024 04:47

review

349ba55

2010YOUY01 force-pushed the sort-bench branch from affe136 to 349ba55 Compare November 15, 2024 05:53

alamb merged commit 5ea1d31 into apache:main Nov 15, 2024
25 checks passed

2010YOUY01 deleted the sort-bench branch November 16, 2024 03:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sort integration benchmark #13306

Add sort integration benchmark #13306

2010YOUY01 commented Nov 8, 2024

alamb left a comment

alamb Nov 13, 2024

2010YOUY01 commented Nov 15, 2024

alamb commented Nov 15, 2024

Add sort integration benchmark #13306

Add sort integration benchmark #13306

Conversation

2010YOUY01 commented Nov 8, 2024

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

alamb left a comment

Choose a reason for hiding this comment

alamb Nov 13, 2024

Choose a reason for hiding this comment

2010YOUY01 commented Nov 15, 2024

alamb commented Nov 15, 2024